AITopics | base loss

base loss

ebea2325dc670423afe9a1f4d9d1aef5-AuthorFeedback.pdf

Neural Information Processing Systems | Feb-10-2026, 23:37:00 GMT

base loss, baseline, experiment, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.32)

raised by multiple reviewers and next respond to individual questions

Neural Information Processing Systems | Aug-17-2025, 03:38:09 GMT

We thank all the reviewers for their feedback and pointers to relevant papers. This includes Kendall et al. (2018), where they learn […] Kendall et al. (2018), we consider different loss functions on the same output space. There are specific reasons we did not use several of the multi-task learning algorithms mentioned by REV4 as baselines. Kendall et al. (2018) assume that all base losses are applications of the same function (maximum likelihood in this case). We do not see how this method can be extended to our scenario, where base losses do not necessarily […] Moreover, our regularization is of a very different nature. However, directly normalizing the base losses was sufficient for our experiments.

artificial intelligence, machine learning, optimization problem, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.32)

Flexible risk design using bi-directional dispersion

Holland, Matthew J.

arXiv.org Artificial Intelligence | Feb-16-2023

Many novel notions of "risk" (e.g., CVaR, tilted risk, DRO risk) have been proposed and studied, but these risks are all at least as sensitive as the mean to loss tails on the upside, and tend to ignore deviations on the downside. We study a complementary new risk class that penalizes loss deviations in a bi-directional manner, while having more flexibility in terms of tail sensitivity than is offered by mean-variance. This class lets us derive high-probability learning guarantees without explicit gradient clipping, and empirical tests using both simulated and real data illustrate a high degree of control over key properties of the test loss distribution incurred by gradient-based learners.
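To make the risks named in this abstract concrete, here is a minimal sketch of empirical estimates of the mean, CVaR, tilted risk, and mean-variance on a small sample of losses. The function names, the sample values, and the parameters `alpha`, `t`, and `weight` are illustrative assumptions, not taken from the paper, and the bi-directional risk class the paper proposes is not reproduced here because the abstract does not define it.

```python
import math

def mean(losses):
    return sum(losses) / len(losses)

def cvar(losses, alpha):
    """Empirical CVaR at level alpha in [0, 1), Rockafellar-Uryasev form."""
    n = len(losses)
    # The objective is convex and piecewise linear in theta,
    # so a minimizer can be found among the sample points.
    return min(
        theta + sum(max(l - theta, 0.0) for l in losses) / ((1.0 - alpha) * n)
        for theta in losses
    )

def tilted_risk(losses, t):
    """(1/t) * log(mean(exp(t * loss))) for t != 0, with a log-sum-exp shift."""
    m = max(t * l for l in losses)
    lse = m + math.log(sum(math.exp(t * l - m) for l in losses))
    return (lse - math.log(len(losses))) / t

def mean_variance(losses, weight):
    """Mean plus a weighted (population) variance penalty."""
    mu = mean(losses)
    var = sum((l - mu) ** 2 for l in losses) / len(losses)
    return mu + weight * var

# Hypothetical losses with one heavy upside outlier.
losses = [0.1, 0.2, 0.2, 0.3, 5.0]
print("mean         ", mean(losses))
print("CVaR(0.8)    ", cvar(losses, 0.8))
print("tilted(t=1)  ", tilted_risk(losses, 1.0))
print("mean-variance", mean_variance(losses, 0.5))
```

On this sample, CVaR and the tilted risk with positive `t` are both at least as large as the mean, which matches the abstract's remark about upside tail sensitivity.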

artificial intelligence, machine learning, t-risk, (19 more...)

arXiv.org Artificial Intelligence

2203.14434

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science (0.67)

Student-Teacher Learning from Clean Inputs to Noisy Inputs

Hong, Guanzhe, Mao, Zhiyuan, Lin, Xiaojun, Chan, Stanley H.

arXiv.org Machine Learning | Mar-12-2021

Feature-based student-teacher learning, a training method that encourages the student's hidden features to mimic those of the teacher network, is empirically successful in transferring knowledge from a pre-trained teacher network to the student network. Furthermore, recent empirical results demonstrate that the teacher's features can boost the student network's generalization even when the student's input sample is corrupted by noise. However, there is a lack of theoretical insight into why and when this method of transferring knowledge can succeed between such heterogeneous tasks. We analyze this method theoretically using deep linear networks, and experimentally using nonlinear networks. We identify three factors vital to the success of the method: (1) whether the student is trained to zero training loss; (2) how knowledgeable the teacher is on the clean-input problem; (3) how the teacher decomposes its knowledge in its hidden features. Lack of proper control over any of the three factors leads to failure of the student-teacher learning method.
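As a rough illustration of the feature-matching idea described in this abstract, the sketch below combines a base loss on the student's output with a penalty on the distance between the student's hidden features (computed from a noisy input) and the teacher's hidden features (computed from the clean input). The two-layer linear networks, the squared-error base loss, the weight `lam`, and all names are assumptions made for illustration; the abstract does not give the paper's exact objective.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def student_teacher_loss(student, teacher, x_clean, x_noisy, y, lam):
    """Base loss plus a feature-matching penalty (hypothetical form).

    student and teacher are (W1, W2) pairs describing two-layer
    linear networks: hidden = W1 x, output = W2 hidden.
    """
    s_hidden = matvec(student[0], x_noisy)   # student sees the noisy input
    t_hidden = matvec(teacher[0], x_clean)   # teacher sees the clean input
    s_out = matvec(student[1], s_hidden)
    base = sq_dist(s_out, y)                 # assumed squared-error base loss
    feature = sq_dist(s_hidden, t_hidden)    # feature-matching term
    return base + lam * feature

# Hypothetical toy setup: student and teacher share the same weights.
teacher = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]])
student = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]])
x_clean, y = [1.0, 2.0], [3.0]

print(student_teacher_loss(student, teacher, x_clean, x_clean, y, lam=2.0))
print(student_teacher_loss(student, teacher, x_clean, [1.5, 2.0], y, lam=2.0))
```

With identical weights and a clean input, both terms vanish; perturbing the student's input makes both the base loss and the feature penalty positive.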

probability, student, student-teacher learning, (16 more...)

arXiv.org Machine Learning

2103.076

Country:

North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.47)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
